Exploratory Analysis: FPCA for PFT Tundra with disturbances between 2015 and 2040


Patches that experience a disturbance between 2015 and 2040 form the basis for this FPCA. For computational reasons, only one patch per grid cell is considered. As a first step in the exploratory analysis, the data are represented as functions by means of a B-spline basis. Several combinations of spline order and penalized derivative were analyzed, including the following setups:

Note that no linear differential operator has been constructed so far. This could be a means to improve the fit even further. Several values of the regularization parameter \(\lambda\) were tested for different settings; \(\lambda = 1\) was chosen by visual inspection and because it avoids a singular fit.
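The smoothing step can be illustrated with a small sketch. It is written in Python with NumPy rather than with the fda package, so it is not the code behind the figures. It minimizes the usual penalized least-squares criterion, the sum of squared residuals plus \(\lambda\) times the integrated squared derivative of the fitted curve. The knot placement (one interior knot every 10 years), the synthetic test curve, and the finite-difference approximation of the penalty matrix are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, order):
    """Evaluate all B-spline basis functions of the given order at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Order 1: indicator functions of the knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval so that x == t[-1] is covered as well.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for k in range(2, order + 1):
        B_new = np.zeros((x.size, t.size - k))
        for j in range(t.size - k):
            left = t[j + k - 1] - t[j]
            right = t[j + k] - t[j + 1]
            if left > 0:
                B_new[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + k] - x) / right * B[:, j + 1]
        B = B_new
    return B

def roughness_penalty(knots, order, deriv, n_grid=2001):
    """Approximate the matrix of integrals of D^m phi_i * D^m phi_j by finite differences."""
    grid = np.linspace(knots[0], knots[-1], n_grid)
    h = grid[1] - grid[0]
    D = np.diff(bspline_basis(grid, knots, order), n=deriv, axis=0) / h**deriv
    return h * D.T @ D

def smooth_curve(t_obs, y_obs, knots, order=6, deriv=3, lam=1.0):
    """Penalized least squares: solve (Phi'Phi + lam * R) c = Phi'y."""
    Phi = bspline_basis(t_obs, knots, order)
    R = roughness_penalty(knots, order, deriv)
    coef = np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ y_obs)
    return coef, Phi @ coef

# Synthetic example curve (not the report's data): a noisy bump over years 0..100.
order = 6
knots = np.concatenate([np.zeros(order), np.arange(10.0, 100.0, 10.0), np.full(order, 100.0)])
years = np.arange(0.0, 101.0)
rng = np.random.default_rng(1)
share = np.exp(-((years - 30.0) / 15.0) ** 2) + rng.normal(scale=0.05, size=years.size)
coef, fitted = smooth_curve(years, share, knots, order=order, deriv=3, lam=1.0)
```

The order of the penalized derivative is kept as the parameter `deriv`, so different setups can be compared by changing a single argument. Increasing `lam` pulls the fit toward a polynomial of degree `deriv - 1`, which the penalty leaves unpenalized.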

Basis Representation - Tundra

Figure 1 shows the chosen basis representation for Tundra with B-splines of order 6, penalizing the third derivative with a penalization parameter \(\lambda = 1\).

Figure 1: Smoothed basis representation for Tundra and all four scenarios. Here, the fourth derivative of the order-6 splines is penalized.

Note that the data pre-processing is constructed such that the curves are already aligned at the start of the process, so formally no registration would be necessary. Nevertheless, registration is applied later to further reduce the variability in the data.


Questions here:

  1. Is the chosen setting appropriate or maybe too flexible or too rigid (usually every curve should start at 0)?
  2. Could a linear differential operator be useful here?
  3. How can the fit be assessed?
  4. The fit is cut off after 100 years in the plots, but for some curves more data are available and are used for fitting the curve. This was done because it gives a more stable fit at year 100. Is this a good approach?

As a second step, data registration is considered. Unfortunately, the function landmarkreg (package fda) is not available, so landmark registration is not possible without implementing it ourselves. Thus, the curves are registered with the function register.fd, which is not based on landmarks but on alignment with the mean function. As stated by Ramsay et al. (2009), this type of registration should usually be conducted after landmark registration. Figure 2 shows the registered curves for Tundra.

Figure 2: Smoothed basis representation for Tundra and all four scenarios with registration

We can see that registration carried out in this way does not give sensible results, since the target values are no longer in the appropriate range.

Questions here:

  1. Is registration really necessary here?
  2. If yes, should registration be conducted before running an FPCA?
  3. (What is happening in the control scenario here?)
  4. Should a landmark registration be applied as well?
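As a reference point for the last question, a self-implemented landmark registration can be fairly small. The sketch below is a hypothetical Python/NumPy illustration and not a replacement for the fda routines. It assumes that the only landmark is the time of the peak of each curve, that all curves are evaluated on a common grid, and that a piecewise-linear time warp is acceptable.

```python
import numpy as np

def landmark_register(t, curves):
    """Align the peak of every curve to the mean peak time with a piecewise-linear warp."""
    t = np.asarray(t, dtype=float)
    curves = np.asarray(curves, dtype=float)
    # Landmark: time of the maximum, restricted to interior grid points
    # so that the warping function stays strictly increasing.
    peak_idx = 1 + np.argmax(curves[:, 1:-1], axis=1)
    peaks = t[peak_idx]
    target = peaks.mean()
    registered = np.empty_like(curves)
    for i, (x, p) in enumerate(zip(curves, peaks)):
        # Warp h_i maps registered time to original time:
        # start -> start, target -> own peak, end -> end.
        h = np.interp(t, [t[0], target, t[-1]], [t[0], p, t[-1]])
        # Registered curve x_i(h_i(t)), evaluated by linear interpolation.
        registered[i] = np.interp(h, t, x)
    return registered, target
```

Because the registered curves are obtained by linear interpolation of the original values, they stay within the original value range of each curve, which could serve as a sanity check when comparing against the output of register.fd.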

Principal Component Analysis - Tundra

To further analyze the data, an FPCA is run for each of the four scenarios and each of the five PFTs separately. Again, let’s take a look at the first two principal components for each scenario of PFT Tundra. Figure 3 shows the unregistered principal components.

Figure 3: First two principal components for each scenario for PFT Tundra

Clearly, the principal components for the three climate scenarios show some similarities. High scores on the first principal component reflect a higher peak in the share of above ground carbon than the mean, while lower scores represent a lower peak and a more pronounced decrease in years 30 to 40 after disturbance. The second principal component again reflects the size of the peak but focuses on a switch in the dynamics around 20 years after disturbance.

For the control scenario, the interpretation is different. Here, the variation is less focused on different heights of the peak, but more on the general behavior over the whole time period. Higher values of PC1 indicate a relative share of above ground carbon far higher than the mean, while lower values indicate the opposite.
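The interpretation of high and low scores can be made concrete with a discretized FPCA. The following Python/NumPy sketch is an assumption-laden stand-in for a basis-coefficient FPCA: it treats each curve as a vector of values on an equally spaced grid and uses a singular value decomposition. The function name and the synthetic input used to try it out are hypothetical.

```python
import numpy as np

def fpca_discretized(t, curves, n_components=2):
    """FPCA on curves sampled on an equally spaced grid, computed via an SVD."""
    t = np.asarray(t, dtype=float)
    X = np.asarray(curves, dtype=float)
    h = t[1] - t[0]                      # grid spacing (assumed constant)
    mean = X.mean(axis=0)
    Xc = X - mean                        # centered curves
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rescale the right singular vectors so that each harmonic has unit L2 norm,
    # i.e. the Riemann sum of xi(t)^2 * h equals 1.
    harmonics = Vt[:n_components] / np.sqrt(h)
    # Scores are the inner products of the centered curves with the harmonics.
    scores = Xc @ harmonics.T * h
    var_share = (s**2 / np.sum(s**2))[:n_components]
    return mean, harmonics, scores, var_share
```

With this output, the effect of a component can be displayed as `mean + c * harmonics[k]` and `mean - c * harmonics[k]` for a constant `c` of the order of the standard deviation of `scores[:, k]`. A curve with a high score on component `k` resembles the first of these, a curve with a low score the second.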


With registration

Figure 4 shows the two principal components for each scenario for the registered curves.

Figure 4: First two principal components for each scenario for PFT Tundra for registered B-spline representation.

We can see that the effects are less pronounced than in the unregistered case, which is not surprising since the registration process removes some of the variability in the data. Still, the outcome is not meaningful, because the values lie outside the range of possible values.


With VARIMAX rotation

For a better understanding and an easier interpretation of the principal components, a VARIMAX rotation is applied. This rotation algorithm may reveal more meaningful components of variation in the data (Ramsay et al., 2009).

Figure 5 shows the VARIMAX rotated first and second principal components for each scenario.

Figure 5: First two principal components for each scenario for PFT Tundra with VARIMAX rotation.

In this case, the VARIMAX rotation does not substantially increase the interpretability, since the dynamics of the first two principal components are hardly changed. Thus, a rotation might be unnecessary.
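For reference, the rotation itself can be sketched in a few lines. The Python/NumPy code below is a generic VARIMAX iteration based on repeated singular value decompositions and is not the fda implementation. It assumes the components are supplied as a matrix of values on a grid (rows are grid points, columns are components); the function names are hypothetical.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    L2 = np.asarray(loadings, dtype=float) ** 2
    return np.sum(np.mean(L2**2, axis=0) - np.mean(L2, axis=0) ** 2)

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return the rotated loadings A @ R and the orthogonal rotation matrix R."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient-type matrix of the VARIMAX criterion with respect to the rotation.
        G = A.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                       # closest orthogonal matrix to G
        crit = s.sum()
        if crit_old != 0 and crit < crit_old * (1 + tol):
            break                        # no relevant improvement any more
        crit_old = crit
    return A @ R, R
```

Since `R` is orthogonal, the rotated components span the same space as the original ones, and the corresponding scores are obtained by multiplying the original score matrix with the same `R`. If the unrotated components are already close to a maximum of the criterion, `R` stays close to the identity, which would match the small differences observed here.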


Cluster detection: PC1 vs. PC2 - Tundra

In order to detect possible clusters in the data, i.e. groups of grid cells in which the share of above ground carbon behaves in a similar way, the first two principal components are plotted against each other for all three considered cases: unrotated and unregistered (Figure 6), unrotated and registered (Figure 7) and VARIMAX rotated and unregistered (Figure 8). The color indicates a rough classification into regions, here continents.

Figure 6: First principal component vs. second principal component for each scenario for PFT Tundra.

In Figure 6, clear cluster formation is visible for all four scenarios. Again, the three warming scenarios tend to follow a similar pattern with two clearly distinguishable clusters. For the control scenario, the data are more scattered.


Figure 7: First registered principal component vs. second registered principal component for each scenario for PFT Tundra.

In the registered case, displayed above, clusters are formed again, and these clusters are again rather similar for the three climate scenarios. The scattering of the control scenario is also visible here. Overall, however, these results should not be over-interpreted, given the poor fit of the registered curves to the initial data.


Figure 8: First VARIMAX rotated component vs. second VARIMAX rotated component for each scenario for PFT Tundra

The results for the rotated FPCA are rather similar to those of the unrotated one, which is not surprising since the principal components themselves differ only slightly between Figures 3 and 5.

The question arises where the respective grid cells are located on the world map. Figure 9 shows the locations for unrotated SSP5-RCP8.5.

Figure 9: Locations of the above cluster in Figure 6 for SSP5-RCP8.5 for PFT Tundra.

No geographical pattern is visible, so location does not explain the similar behavior of the Tundra share.

To evaluate whether the same clustering pattern is present in the other four PFTs, the data for Tundra are first classified into four clusters (see Figure 10). Note that cluster 4 is not present in the climate scenarios.

Figure 10: Clusters for unrotated PC1 vs. PC2 for PFT Tundra.
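The report does not fix a particular method for this classification, so the following Python/NumPy sketch is only one possible reading. It assumes a plain k-means on the (PC1, PC2) scores with a deterministic farthest-point initialization, and it reuses the resulting labels for another PFT by matching grid cells. The function names and the grid cell identifiers are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    X = np.asarray(points, dtype=float)
    # Start with the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)     # nearest center for every point
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def transfer_labels(cell_ids_ref, labels_ref, cell_ids_other):
    """Look up the reference cluster of each grid cell; -1 marks cells without a label."""
    lookup = dict(zip(cell_ids_ref, labels_ref))
    return np.array([lookup.get(c, -1) for c in cell_ids_other])
```

Calling `kmeans(scores_tundra, 4)` on an array of Tundra scores (a hypothetical name) would give four clusters. `transfer_labels` then assigns to each grid cell of another PFT the Tundra cluster of the same cell, which is what is needed to color that PFT's score plot by Tundra cluster.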

While the climate scenarios are mainly dominated by cluster 2, cluster 1 is dominant in the control scenario. To give an impression of where the respective grid cells are located on the map, Figure 11 shows the grid cells under consideration in the color of their clusters. As expected, there is no clear spatial dependency.

Figure 11: Locations of the grid cells and their corresponding clusters for PFT Tundra.

Next, the unrotated PC scores for the other four PFTs are colored according to the respective Tundra cluster. Figure 12 shows the PC scores for Needleleaf Evergreen.

Figure 12: Clusters for unrotated PC1 vs. PC2 for PFT Needleleaf Evergreen.

Some grouping of the Tundra clusters may be visible for this PFT as well, although the scores themselves do not indicate a clear clustering structure for PFT Needleleaf Evergreen. Figure 13 shows the clusters for PFT Pioneering Broadleaf.

Figure 13: Clusters for unrotated PC1 vs. PC2 for PFT Pioneering Broadleaf.

Here too, some grouping is visible, but due to the dominance of cluster 1 in the control and cluster 2 in the climate scenarios, the Tundra clusters do not entirely match the indicated Pioneering Broadleaf clusters.

Figure 14 shows the same for Temperate Broadleaf. Here, no clear clustering pattern is detectable in the PC scores themselves, and the Tundra clusters do not really transfer to these scores either.

Figure 14: Clusters for unrotated PC1 vs. PC2 for PFT Temperate Broadleaf.

Finally, Figure 15 shows the first two principal components plotted against each other for the remaining PFT Other Conifers. Here too, the scores themselves do not show a clustering structure as clear as for Tundra. However, the grouping of the Tundra clusters is present, which suggests some kind of dependence.

Figure 15: Clusters for unrotated PC1 vs. PC2 for PFT Other Conifers.

Questions here:

  1. How can the clustering be explained? The location is clearly not the crucial part here.
  2. How could we interpret the plots, if there is no natural clustering structure visible?

Spatial Distribution visualized in world maps

In order to get a better understanding of the spatial component of the data, Figure 16 shows how the portion of above ground carbon develops in each grid cell (patch 1) from year 0 to 100 after disturbance.

Figure 16: Spatial distribution of the portion of above ground carbon after disturbance for each scenario for PFT Tundra.

We can clearly see major differences between the scenarios: while the portion shrinks rapidly in the worst-case scenario SSP5-RCP8.5, Tundra stays dominant in the control scenario for about 50 years in the majority of locations.


Possible next steps: